ABSTRACT


Feature extraction is an important step for extracting useful and valuable
information from the electromyography (EMG) signal. However, the process
of feature extraction requires prior knowledge and expertise. In this paper,
a featureless EMG pattern recognition technique is proposed to tackle the
feature extraction problem. Initially, the spectrogram is employed to transform
the raw EMG signal into a time-frequency representation (TFR). The TFRs, or
spectrogram images, are then fed directly into a convolutional neural
network (CNN) for classification. Two CNN models are proposed to learn
the features automatically from the spectrogram images without the need for
manual feature extraction. The proposed CNN models are evaluated using
the EMG data acquired from the publicly accessible NinaPro database.
Our results show that the CNN classifier can offer a best mean classification
accuracy of 88.04% for the recognition of hand and wrist movements.

Keywords: Electromyography, Pattern recognition, Spectrogram
1. INTRODUCTION

In recent years, electromyography (EMG) has shown its potential in biomedical, clinical, myoelectric
prosthetic and rehabilitation applications. The EMG signal is a biological signal that measures the
potential difference generated by the muscle fibres during muscle contraction [1]. Additionally,
the EMG signal provides rich muscle and motor information, which has enabled myoelectric prosthetic
control for nearly 40 years [2]. In this regard, many researchers have paid considerable attention to
analyzing the EMG signal for controlling electrically powered prostheses [3]. An EMG-based prosthetic hand
helps amputees and stroke patients to restore limb functionality [2], [4]. With the advance of machine
learning, the development of multifunctional myoelectric prostheses has accelerated. However, an accurate
multifunction prosthesis remains a challenge in real-time applications [4], [5].

Machine learning has recently drawn the attention of researchers in EMG pattern recognition.
By applying machine learning algorithms, the performance of myoelectric prostheses has shown great
improvement [6]. Nevertheless, feature extraction is a critical step for achieving a good recognition rate in
myoelectric control. In previous studies, time domain (TD), frequency domain (FD) and time-frequency
(TF) features were widely used in EMG pattern recognition [7]-[10]. However, the selection and the number
of features are mostly empirical and require expertise. Furthermore, the performance of the features is often
inconsistent, which leads to unsatisfactory classification results [11]-[13]. Hence, the deep convolutional
neural network (CNN) is introduced in this work to eliminate the need for manual feature extraction.

CNN is a popular deep learning method that has been successfully applied to the
classification of high-dimensional data, especially images [14], [15]. In signal processing, a signal can be
transformed into a time-frequency representation (TFR) using a time-frequency distribution (TFD). The
resulting TFD images can be used as the input to a CNN for diagnosis, classification and identification.
In one study, Wang et al. [12] performed fault diagnosis by using a series of wavelet scalograms as the input
to a CNN without manual feature extraction. Furthermore, Verstraete et al. [16] used the spectrogram,
Hilbert-Huang transform (HHT) and wavelet scalogram as CNN inputs for fault detection. These previous
works show that TFDs have been successfully combined with CNN models for classification tasks.

This study aims to investigate the performance of CNN models in classifying basic hand and
wrist movements. Firstly, the EMG data of 11 amputees and 10 healthy subjects are acquired from the NinaPro
databases. After that, the spectrogram is applied to transform the EMG signals into TFRs, or spectrogram
images, which are then fed into the CNN for the recognition process. In this work, two CNN architectures are
proposed and tested on both healthy and amputee datasets. Finally, the performances of the proposed CNN
models are discussed and validated with statistical analysis. In short, this article describes the performance
of featureless EMG pattern recognition for the classification of multiple hand movement types.


2. MATERIAL AND METHOD 
2.1. Materials 

In this work, the EMG data from the Non-Invasive Adaptive Prosthetics (NinaPro) project are used.
Recently, the NinaPro database has been successfully applied in EMG pattern recognition studies [17], [18].
In the present work, the EMG data of 17 hand and wrist movements (Exercise B) including thumb up (M1), 
extension of index, middle and flexion of others (M2), flexion of ring, little and extension of others (M3), 
thumb opposing base of little finger (M4), abduction of all fingers (M5), fingers flexed together in fist (M6), 
pointing index (M7), adduction of extended fingers (M8), wrist supination with axis at middle finger (M9), 
wrist pronation with axis at middle finger (M10), wrist supination with axis at little finger (M11), 
wrist pronation with axis at little finger (M12), wrist flexion (M13), wrist extension (M14), wrist radial 
deviation (M15), wrist ulnar deviation (M16) and wrist extension with closed hand (M17) are utilized [5]. 
The EMG signals are taken from NinaPro database 3 (DB3) and database 4 (DB4), which comprise 11
amputees and 10 healthy subjects, respectively [19], [5]. The EMG signals were sampled at 2 kHz; in
this work, they are first sub-sampled by a decimation factor of 2, giving an effective rate of 1 kHz.
In the experiment, each movement was performed for 5 seconds, followed by a resting state of 3 seconds.
Subjects were instructed to repeat each movement type six times. In total, 1224 EMG signals (17 hand
movement types x 6 repetitions x 12 channels) were collected from each subject. Moreover, the generalized
likelihood ratio algorithm was used for offline relabeling. Note that all the resting states are removed
for DB3 and DB4.
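
The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: the constant and function names are assumptions introduced here, not part of the NinaPro tooling, and the 5 s duration is the nominal value before relabeling.

```python
# Acquisition parameters as stated in the text; all names are illustrative.
N_MOVEMENTS = 17        # Exercise B hand and wrist movements (M1..M17)
N_REPETITIONS = 6       # repetitions per movement
N_CHANNELS = 12         # EMG channels
FS_ORIGINAL_HZ = 2000   # original sampling rate
DECIMATION = 2          # sub-sampling factor used in this work
MOVEMENT_SECONDS = 5    # nominal duration of one movement


def signals_per_subject() -> int:
    """Number of single-channel EMG signals recorded for one subject."""
    return N_MOVEMENTS * N_REPETITIONS * N_CHANNELS


def effective_rate_hz() -> int:
    """Sampling rate after sub-sampling by the decimation factor."""
    return FS_ORIGINAL_HZ // DECIMATION


def samples_per_movement() -> int:
    """Nominal number of samples in one movement after sub-sampling."""
    return effective_rate_hz() * MOVEMENT_SECONDS


print(signals_per_subject(), effective_rate_hz(), samples_per_movement())
# prints: 1224 1000 5000
```

At the resulting 1 kHz rate, one sample corresponds to 1 ms, which is why a 128 ms analysis window equals 128 samples.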


2.2. System Overview 

Figure 1 shows the flow diagram of the proposed featureless EMG pattern recognition
system. In the first step, the EMG signals are collected from DB3 and DB4. A sample EMG signal
obtained from one subject is shown in Figure 2. After that, the spectrogram is applied to transform the signal
into a TFR. Next, the TFRs (spectrogram images) are fed directly into the CNN for predicting the 17 hand and
wrist movements.


[Flow diagram: EMG Signals -> Spectrogram -> Time-Frequency Representation -> Convolutional Neural Network -> Predicting 17 hand and wrist movements]
Figure 1. Flow diagram of proposed featureless EMG pattern recognition system


2.3. Spectrogram 

The spectrogram is the squared magnitude of the short-time Fourier transform (STFT), and it is defined as
the visual representation of the STFT in the time-frequency plane [16]. In a spectrogram, the x-axis
represents time while the y-axis represents frequency. The spectrogram provides the relationship between
the EMG signal and muscle characteristics, which describes the muscle behavior for different hand
movements [20]. Mathematically, the spectrogram can be represented as:

$$ SP(t, f) = \left| \int_{-\infty}^{\infty} x(\tau)\, w(\tau - t)\, e^{-j 2 \pi f \tau}\, d\tau \right|^{2} \tag{1} $$

where x(τ) is the input signal and w(τ - t) is the Hanning window function. In this work, a window length of
128 ms (128 samples at the sub-sampled rate of 1 kHz) with 50% overlap (64 samples) is used. Additionally,
the spectrogram is computed using 128 fast Fourier transform (FFT) points.
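
The windowing parameters above can be illustrated with a minimal NumPy sketch. It is not the authors' MATLAB code: the function name `spectrogram`, the 125 Hz test tone and the absence of any power scaling are assumptions made here, and `np.hanning` stands in for the Hanning window.

```python
import numpy as np


def spectrogram(x, win_len=128, overlap=64, n_fft=128):
    """Squared magnitude of the STFT using a Hann window."""
    x = np.asarray(x, dtype=float)
    hop = win_len - overlap                      # 64-sample step (50% overlap)
    n_frames = 1 + (len(x) - win_len) // hop     # only complete frames are kept
    window = np.hanning(win_len)
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    stft = np.fft.rfft(frames, n=n_fft, axis=1)  # one-sided spectrum per frame
    return (np.abs(stft) ** 2).T                 # shape: (freq bins, frames)


fs = 1000                                # Hz, rate after sub-sampling
t = np.arange(5 * fs) / fs               # one nominal 5 s movement
tone = np.sin(2 * np.pi * 125 * t)       # synthetic test signal, not EMG
S = spectrogram(tone)
peak_bin = int(S.mean(axis=1).argmax())
print(S.shape, peak_bin * fs / 128)      # (65, 77) 125.0
```

With 128 FFT points the one-sided spectrum has 65 bins spaced 1000/128 ≈ 7.8 Hz apart, and a 5000-sample signal yields 77 complete frames.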
Figure 2. Sample EMG signal obtained from one subject


In this framework, the spectrograms generated from the twelve channels are combined into a series
image. After that, the spectrogram images are resized to 64 x 64 pixels and fed into the CNN for the
classification process. Figure 3 illustrates sample spectrogram images for the 17 hand and wrist movements
obtained from one subject. The color scale represents the amplitude of the signal energy. From Figure 3, one
can see that different types of hand movement produce different visual representations. In particular,
M13 shows a marked difference in its spectrogram image compared to the other movements.
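
The text does not specify how the twelve spectrograms are combined or which interpolation is used for resizing, so the following is only a sketch under loud assumptions: channels are placed side by side along the time axis, nearest-neighbour sampling stands in for the resize, and the result is scaled to [0, 1]. The function name `to_cnn_image` is hypothetical.

```python
import numpy as np


def to_cnn_image(channel_spectrograms, size=64):
    """Join per-channel spectrograms and resample to size x size in [0, 1]."""
    # Assumption: channels are concatenated along the time (column) axis.
    combined = np.concatenate(channel_spectrograms, axis=1)
    # Nearest-neighbour resampling: pick evenly spaced rows and columns.
    rows = np.arange(size) * combined.shape[0] // size
    cols = np.arange(size) * combined.shape[1] // size
    resized = combined[np.ix_(rows, cols)]
    span = resized.max() - resized.min()
    if span == 0:
        return np.zeros_like(resized)
    return (resized - resized.min()) / span


rng = np.random.default_rng(0)
# Random stand-ins for twelve 65 x 77 spectrograms (not real EMG data).
spectrograms = [rng.random((65, 77)) for _ in range(12)]
image = to_cnn_image(spectrograms)
print(image.shape)   # (64, 64)
```

A smoother interpolation (for example bilinear or bicubic) would likely be preferable in practice; nearest-neighbour is used here only to keep the sketch dependency-free.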





Figure 3. Sample spectrogram images for 17 hand and wrist movements 


2.4. Convolutional Neural Network 

The convolutional neural network (CNN) is one of the most successful supervised deep learning algorithms for
the classification of high-dimensional images [14]. Here, the CNN takes the spectrogram images directly as the
input for the classification process, thus avoiding the high complexity of pre-processing and feature
extraction [12]. Instead of manual feature extraction, the CNN extracts features automatically using the
concept of a deep neural network. Basically, a CNN comprises three main layer types, namely the convolutional
layer, the max pooling layer and the fully connected layer. In the first stage, the convolutional layer
arranges its units as a sequence of filters, and each filter is convolved across the width and height of the
input image during the training process [21], [22]. In the second stage, the max pooling layer performs a
non-linear sub-sampling. This sub-sampling aims to reduce the dimension of the input, thus reducing the number
of features in the training network [23]. In the final stage, the fully connected layer multiplies the input
by a weight matrix and carries out the classification process [24], [25]. In short, the convolutional and max
pooling layers construct the features from the images, while the classification process is executed by the
fully connected layer.
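
To make the second and final stages concrete, the sketch below implements 2 x 2 max pooling with stride 2 and a fully connected layer followed by softmax in plain NumPy. The function names and the random weights are illustrative assumptions; a trained network would learn the weight matrix.

```python
import numpy as np


def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over a 2-D feature map (no padding)."""
    h = (fmap.shape[0] - size) // stride + 1
    w = (fmap.shape[1] - size) // stride + 1
    out = np.empty((h, w), dtype=fmap.dtype)
    for i in range(h):
        for j in range(w):
            r, c = i * stride, j * stride
            out[i, j] = fmap[r:r + size, c:c + size].max()
    return out


def fully_connected_softmax(features, weights, bias):
    """Multiply flattened features by a weight matrix, then apply softmax."""
    logits = weights @ features.ravel() + bias
    exp = np.exp(logits - logits.max())   # shift for numerical stability
    return exp / exp.sum()


fmap = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool2d(fmap)
print(pooled)   # each entry is the maximum of one 2 x 2 block

rng = np.random.default_rng(0)
weights = rng.normal(size=(17, pooled.size))   # 17 movement classes
probs = fully_connected_softmax(pooled, weights, np.zeros(17))
print(probs.shape)   # (17,)
```

Pooling halves each spatial dimension, so the 4 x 4 toy map becomes 2 x 2, and the softmax output is a probability vector over the 17 classes.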


2.5. Proposed Model Architecture 

In this work, two CNN models are proposed for featureless EMG pattern recognition. Tables 1 and 2
outline CNN architectures 1 and 2, respectively. As shown in Table 1, CNN architecture 1
consists of 5 layers. The first stage consists of a convolutional layer with 32 feature maps of size 3 x 3.
It is followed by a max pooling layer of size 2 x 2 with a stride of 2. The final stage includes a fully
connected layer and a softmax layer.

From Table 2, it is observed that CNN architecture 2 has an additional convolutional layer and max
pooling layer compared to CNN architecture 1. CNN architecture 2 uses 16 and 32 filters for the first and
second convolutional layers, respectively.


Table 1. Overview of CNN Architecture 1
Layer   Layer Type        Filter Number   Filter Size   Stride Size
1       Convolutional     32              3 x 3         -
2       Max Pooling       1               2 x 2         2
3       Fully Connected   17              64 x 64       -


Table 2. Overview of CNN Architecture 2
Layer   Layer Type        Filter Number   Filter Size   Stride Size
1       Convolutional     16              3 x 3         -
2       Max Pooling       1               2 x 2         2
3       Convolutional     32              3 x 3         -
4       Max Pooling       1               2 x 2         2
5       Fully Connected   17              64 x 64       -
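
The effect of the extra convolution and pooling stage on the feature-map size can be worked out with simple arithmetic. The sketch below assumes a 64 x 64 single-channel input and a convolution stride of 1; the padding is not stated in the text, so it is exposed as a parameter, with "same" padding (1 pixel for a 3 x 3 filter) as an assumed default. All function names are illustrative.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution; padding=1 preserves size for 3 x 3."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel) // stride + 1


def flattened_features(input_size, conv_filters, padding=1):
    """Final map size and feature count after each conv + pool stage."""
    size = input_size
    for _ in conv_filters:
        size = pool_out(conv_out(size, padding=padding))
    return size, size * size * conv_filters[-1]


print(flattened_features(64, [32]))        # architecture 1: (32, 32768)
print(flattened_features(64, [16, 32]))    # architecture 2: (16, 8192)
```

Under these assumptions, architecture 2 passes a quarter as many features to the fully connected layer as architecture 1, despite having more convolutional filters overall. Without padding the sizes shrink slightly, for example `flattened_features(64, [32], padding=0)` gives a 31 x 31 map.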


For both CNN models 1 and 2, each convolutional layer is followed by a batch normalization layer and a
rectified linear unit (ReLU) layer. The batch normalization layer normalizes its input by calculating
the mean and standard deviation over each mini-batch, so that the normalized input has zero mean
and unit variance. The normalized activation can be expressed as:

$$ \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^{2} + \epsilon}} \tag{2} $$

where x_i is the input, μ_B is the mini-batch mean, σ_B^2 is the mini-batch variance and ε is the epsilon
property, a small constant that keeps the division numerically stable. The ReLU layer, in turn, is defined as
a non-linear function that acts as a half-wave rectifier on the weighted sum [16]. The convolutional function
can be written as:


$$ Y_k^{(m)} = \mathrm{ReLU}\left[ W_k^{(m)} * Y_{k-1} + B_k^{(m)} \right] \tag{3} $$

where Y_{k-1} is the input to the convolutional channel, * denotes the convolution operator, W_k^{(m)} is
the kernel filter weight, B_k^{(m)} is the bias weight and ReLU can be represented as:

$$ \mathrm{ReLU}(y) = \max(y, 0) \tag{4} $$
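
The convolution, batch normalization and ReLU operations described in this section can be sketched in NumPy as follows. This is a simplified illustration, not the authors' implementation: it handles a single filter and channel, omits the learnable scale and offset that batch normalization layers usually carry, and uses random toy data. The function names are assumptions.

```python
import numpy as np


def conv2d_valid(image, kernel, bias=0.0):
    """W * Y + B for one filter (cross-correlation, as in most CNN toolkits)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out


def batch_norm(x, eps=1e-5):
    """Normalize one channel over the whole mini-batch (no scale or offset)."""
    mu = x.mean()
    var = x.var()
    return (x - mu) / np.sqrt(var + eps)


def relu(y):
    """Element-wise max(y, 0), as in equation (4)."""
    return np.maximum(y, 0)


rng = np.random.default_rng(0)
batch = rng.random((20, 8, 8))       # mini-batch of 20 toy 8 x 8 images
kernel = rng.normal(size=(3, 3))     # one hypothetical 3 x 3 filter
conv = np.stack([conv2d_valid(img, kernel, bias=0.1) for img in batch])
normalized = batch_norm(conv)
activated = relu(normalized)
print(conv.shape)                    # (20, 6, 6)
```

The sketch applies the layers in the order given in the text (convolution, batch normalization, ReLU); after normalization the channel has approximately zero mean and unit variance, and ReLU then zeroes the negative half.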


3. RESULTS AND ANALYSIS 

In this paper, the classification performances of the two proposed CNN architectures are investigated.
The performances of the CNNs on both healthy and amputee subjects are also discussed. Notably, the TFR
images generated by the spectrogram are used directly as the input for the classification process. All the
analysis is done in MATLAB 9.3 on a computer with an Intel Core i5-3340 3.1 GHz processor and 8 GB of
random access memory (RAM). In the training phase, a mini-batch size of 20 is used to minimize the
cross-entropy loss. The initial learning rate is set at 0.01 and the maximum number of training epochs is
fixed at 50. In order to eliminate the saturation information, the spectrogram images are converted into
grayscale images. For performance evaluation, 6-fold cross validation is implemented. The data is randomly
divided into six equal subsets and each subset is used for testing in turn, while the remaining five subsets
are used as the training samples.
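
The 6-fold procedure can be sketched with the standard library alone. The sample count is an assumption: one image per movement repetition gives 17 x 6 = 102 images per subject, which is consistent with the reported per-subject accuracies being multiples of 1/102 (for example 87/102 = 85.29%). The function name and the seed are illustrative.

```python
import random


def k_fold_indices(n_samples, k=6, seed=0):
    """Yield (train, test) index lists for k equal, randomly drawn folds."""
    if n_samples % k:
        raise ValueError("n_samples must be divisible by k for equal folds")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)       # random division of the data
    fold = n_samples // k
    for i in range(k):
        test = order[i * fold:(i + 1) * fold]
        train = order[:i * fold] + order[(i + 1) * fold:]
        yield train, test


# Assumption: 17 movements x 6 repetitions = 102 images per subject.
splits = list(k_fold_indices(17 * 6))
for train, test in splits:
    print(len(train), len(test))             # 85 17 for every fold
```

Each image is used for testing exactly once, and the accuracy for a subject is the fraction of the 102 images classified correctly across the six test folds.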


3.1. Performance Evaluation 

In order to investigate the effect of the number of convolutional layers on the CNN model, two CNN
architectures have been proposed and studied. Table 3 exhibits the classification accuracy of CNN models 1
and 2 for both healthy and amputee subjects. It is observed that the accuracies for healthy subjects are
higher than those for amputee subjects. This is expected, since amputees have difficulty performing the hand
movements without a complete upper limb; in the experiment, they performed the movements based on their
imagination.


Table 3. Classification Accuracy of CNN Model 1 and 2 for Both Healthy and Amputee Subjects

                        Classification Accuracy (%)
            Healthy Subject               Amputee Subject
Subject     CNN Model 1   CNN Model 2     CNN Model 1   CNN Model 2
1           85.29         88.24           76.47         79.41
2           89.22         83.33           72.55         68.63
3           85.29         88.24           38.24         45.10
4           89.22         89.22           63.73         71.57
5           86.27         86.27           35.29         34.31
6           91.18         93.14           67.65         68.63
7           88.24         95.10           35.29         31.37
8           85.29         91.18           69.61         76.47
9           82.35         85.29           72.55         79.41
10          79.41         80.39           30.39         35.29
11          -             -               69.61         73.53
Mean        86.18         88.04           57.40         60.34
STD         3.50          4.45            18.28         19.50


As can be seen in Table 3, with CNN model 1 the majority of healthy subjects achieved an
accuracy above 80%, the exception being subject 10. In contrast, all the healthy subjects obtained an
accuracy above 80% when CNN model 2 is employed. For the healthy and amputee datasets, CNN model 2
achieved the highest mean classification accuracies of 88.04% and 60.34%, respectively.
Compared with CNN model 1, this is an increment of 1.86 and 2.94 percentage points in mean classification
accuracy for the healthy and amputee datasets, respectively. It is worth noting that CNN model 2 has an
additional convolutional layer and max pooling layer. These findings indicate that an additional
convolutional layer is able to enhance the quality of classification. This may be due to the additional
stage of automatic feature extraction in the CNN model. Based on the results obtained, the number of
convolutional layers is an important consideration in designing a CNN architecture. Furthermore, the result
of the t-test shows that the classification performances of CNN models 1 and 2 are not significantly
different (p=0.1299) for the healthy dataset. For the amputee dataset, there is a significant difference
(p=0.0484) between the classification performances of CNN models 1 and 2.
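
The text does not say which form of t-test was used. As a sketch, assuming a two-sided paired t-test across subjects, the healthy-subject columns of Table 3 give a p-value close to the reported 0.1299. The p-value is obtained here by numerically integrating the Student t density with Simpson's rule; the function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


def paired_t_test(a, b):
    """Return (t statistic, two-sided p-value) for paired samples a and b."""
    d = [y - x for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)   # sample variance
    t = mean / math.sqrt(var / n)
    return t, two_sided_p(t, n - 1)


# Healthy-subject accuracies (%) for subjects 1 to 10, from Table 3.
model_1 = [85.29, 89.22, 85.29, 89.22, 86.27, 91.18, 88.24, 85.29, 82.35, 79.41]
model_2 = [88.24, 83.33, 88.24, 89.22, 86.27, 93.14, 95.10, 91.18, 85.29, 80.39]
t_stat, p_value = paired_t_test(model_1, model_2)
print(round(t_stat, 3), round(p_value, 3))
```

Because p exceeds 0.05, the improvement on healthy subjects cannot be distinguished from subject-to-subject variation at the 5% level, which matches the interpretation given above.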

Table 4 summarizes the computational cost of CNN models 1 and 2 for both healthy and amputee
subjects. Note that the computational times in Table 4 are averaged across the 11 amputees and the 10
healthy subjects. From Table 4, CNN model 2 spent nearly 6 minutes on the training and testing sessions.
Compared to CNN model 1, CNN model 2 took roughly one extra minute to complete the classification task.
CNN model 1 was therefore less time consuming than model 2, which means that the additional
layers in the CNN model increased the computational complexity. Considering both the classification accuracy
and the computational cost, it is believed that CNN model 2 is more appropriate for featureless EMG
pattern recognition.


Table 4. Computational Cost of CNN Model 1 and 2

            Average Computational Time (s)
Subject     CNN Model 1   CNN Model 2
Healthy     295.09        337.42
Amputee     294.51        361.75


Featureless EMG pattern recognition based on convolutional neural network (Jingwei Too) 


1296 ISSN: 2502-4752


In the final part of the evaluation, the mean class-wise accuracies of CNN model 2 for the healthy
and amputee datasets are calculated, as shown in Figures 4 and 5. From Figure 4, M1, M6, M9, M13, M14,
M15, M16 and M17 are the most correctly predicted movements (above 90%), while the lowest recognition rate
falls on M5 (76.3%). From Figure 5, it is clear that the 17 hand and wrist movements performed by amputee
subjects are difficult to recognize, especially M3 (51.5%). The best recognition rate is found for M1
(75.4%), followed by M6 (71.4%).
Figure 4. Confusion matrix of CNN model 2 across 10 healthy subjects (%)
Figure 5. Confusion matrix of CNN model 2 across 11 amputee subjects (%)
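
Class-wise accuracy can be read off a confusion matrix as the diagonal entry divided by the row total. The sketch below assumes that rows are true classes and columns are predicted classes, which the figures do not state explicitly, and uses a made-up 3-class matrix rather than the paper's 17-class results.

```python
def class_wise_accuracy(confusion):
    """Percentage of each true class (row) that was predicted correctly."""
    result = []
    for i, row in enumerate(confusion):
        total = sum(row)
        result.append(100.0 * row[i] / total if total else 0.0)
    return result


# Toy counts only (rows: true class, columns: predicted class).
toy = [
    [9, 1, 0],
    [2, 6, 2],
    [0, 1, 9],
]
accuracies = class_wise_accuracy(toy)
print(accuracies)   # [90.0, 60.0, 90.0]
```

In this toy case the second class is confused with both neighbours, which is the kind of pattern the per-movement percentages above summarize.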


4. CONCLUSION 

In the current work, the performance of featureless EMG pattern recognition in classifying the 17 hand
and wrist movements is presented. Two CNN architectures have been proposed for performance evaluation.
In this framework, the CNN takes the spectrogram images directly for classification without the need for
feature extraction. The performances of the proposed CNNs are evaluated using the EMG data of 11 amputees
and 10 healthy subjects. Our results show that CNN model 2 offers the highest mean classification accuracy
of 88.04% for recognizing the 17 hand and wrist movements. This indicates that the additional convolutional
layer can improve the classification results, but at extra computational cost. However, it is worth noting
that the best classification accuracy is below 90%, which needs to be improved for real-time application.
In future work, a deeper CNN architecture could be designed to pursue better classification performance.